TwiSty: A Multilingual Twitter Stylometry Corpus for Gender and Personality Profiling
Authors
Abstract
Personality profiling is the task of detecting the personality traits of authors based on their writing style. Several personality typologies exist; the Myers-Briggs Type Indicator (MBTI), however, is particularly popular in the non-scientific community, and many people use it to analyse their own personality and discuss the results online. As a result, large amounts of self-assessed MBTI data are readily available on social media platforms such as Twitter. We present a novel corpus of tweets annotated with the MBTI personality type and gender of their author for six Western European languages (Dutch, German, French, Italian, Portuguese and Spanish). We outline the corpus creation and annotation, show statistics of the resulting data distributions, and present first baselines on Myers-Briggs personality profiling and gender prediction for all six languages.
Similar resources
Gender Profiling for Slovene Twitter communication: the Influence of Gender Marking, Content and Style
We present the results of what are, to our knowledge, the first gender classification experiments on Slovene text. Inspired by the TwiSty corpus and experiments (Verhoeven et al., 2016), we used the Janes corpus (Erjavec et al., 2016) and its gender annotations to perform gender classification experiments on Twitter text, comparing a token-based and a lemma-based approach. We find that the token-based appr...
Author Profiling of Twitter Users: Notebook for PAN at CLEF 2015
In this paper, we focus on profiling authors by age, gender, and five personality traits. The corpus consists of anonymized Twitter posts in 4 different languages. Our approach combines TF-IDF, function words, stylistic features, and text bigrams, and uses an SVM for each task.
Identification of Author Personality Traits using Stylistic Features: Notebook for PAN at CLEF 2015
Author profiling is the task of determining the age, gender or personality type of an author by studying the sociolect aspect of their writing, that is, how language is shared by people. This paper presents the COMSATS Institute of Information Technology, Lahore entry for the Author Profiling task of the PAN 2015 competition. Our proposed system is based on stylometric features. We implemented 29 different ...
Author Profiling using Stylometric and Structural Feature Groupings
In this paper we present an approach to the task of author profiling. We propose a coherent grouping of features combined with appropriate preprocessing steps for each group. The groups we used were stylometric and structural, featuring, among others, trigrams and counts of Twitter-specific characteristics. We address gender and age prediction as a classification task and personality prediction...
A Survey on Authorship Profiling Techniques
Authorship analysis is a text analysis technique that takes three main forms, namely Authorship Profiling, Authorship Identification and Plagiarism Detection. This paper presents a brief survey of recent developments in author profiling approaches. Authorship Profiling aims to ascertain various author characteristics like age, gender, native co...